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In this paper, a nonparametric maximum likelihood (ML) estima¬ 
tor for band-limited (BL) probability density functions (pdfs) is pro¬ 
posed. The BLML estimator is consistent and computationally ef¬ 
ficient. To compute the BLML estimator, three approximate algo¬ 
rithms are presented: a binary quadratic programming (BQP) algo¬ 
rithm for medium scale problems, a Trivial algorithm for large-scale 
problems that yields a consistent estimate if the underlying pdf is 
strictly positive and BL, and a fast implementation of the Trivial al¬ 
gorithm that exploits the band-limited assumption and the Nyquist 
sampling theorem ("BLMLQuick”). All three BLML estimators out¬ 
perform kernel density estimation (KDE) algorithms (adaptive and 
higher order KDEs) with respect to the mean integrated squared er¬ 
ror for data generated from both BL and infinite-band pdfs. Further, 
the BLMLQuick estimate is remarkably faster than the KD algorithms. 
Finally, the BLML method is applied to estimate the conditional in¬ 
tensity function of a neuronal spike train (point process) recorded 
from a rat’s entorhinal cortex grid cell, for which it outperforms 
state-of-the-art estimators used in neuroscience. 

probability density | maximum likelihood | band limited | nonparametric | 
estimation | BLML | BLMLQuick | BLML-BQP | BLMLTrivial 

Abbreviations; BLML: Band limited maximum likelihood; pdf: probability den¬ 
sity function; KDE; Kernel density estimation; BQP; binary quadratic program¬ 
ming; MISE; mean integrated squared error; MNLL; mean normalized log-likelihood; 
KDE2nd, KDE6th, KDEsinc; 2nd and 6th order Gaussian and sine Kernel density es¬ 
timators; BLMLQuick, BLML-BQP, BLMLTrivial: BLML quick, trivial and 
BQP algorithms 

Significance Statement 

Nonparametric estimation of probability densities has became 
increasingly important to make predictions abont processes 
where parametrization is difficult. However, unlike paramet¬ 
ric approaches, nonparametric approaches are often not opti¬ 
mal in the sense of maximizing likelihood, which would ensure 
several optimality properties of the estimator. In this study, a 
novel nonparametric density estimation technique that max¬ 
imizes likelihood and that converges to the true pdf is pre¬ 
sented. Simulations show that this technique outperforms and 
outruns sophisticated kernel density estimation techniques, 
and yields the fastest convergence rate to date. 

W hen making inferences from experimental data, it is 
often required to model random phenomena and es¬ 
timate a pdf. A common approach is to assume that the 
pdf belongs to a class of parametric functions (e.g., Gaussian, 
Poisson), and then estimate the parameters by maximizing 
the data likelihood function. Parametric models have sev¬ 
eral advantages. First, they are often efficiently computable. 
Second, the parameters may be related back to physiological 
and environmental variables. Finally, ML estimates have nice 
asymptotic properties when the actual distribution lies in the 
assumed parametric class. However, if the true pdf does not 
lie in the assumed class of functions, large errors may occur, 
potentially resulting in misleading inferences. 

When little is known a priori about the pdf, nonparametric 
estimation is an option. However, maximizing the likelihood 
function yields spurious solutions as the dimensionality of the 


problem typically grows with the number of data samples, 
n [T]. To deal with this, several nonparametric approaches 
penalize the likelihood function by adding a smoothness con¬ 
straint. Such penalty functions have nice properties of hav¬ 
ing unique maxima that can be computed. However, when 
smoothness conditions are applied, the asymptotic properties 
of ML estimates are typically lost [I]. 

Other methods for nonparametric estimation assume that 
the pdf is a linear combination of scaled and stretched versions 
of a single kernel function I3I31I1E]- These methods fall un¬ 
der kernel density (KD) estimation, which have been studied 
for decades. However, choosing an appropriate kernel is still 
a tricky and often an arbitrary process [6] . Additionally, even 
the best KD estimators [SlElllllS] have slower convergence 
rates , Op (n i2/r3^ j-Qj. second and sixth-order 

Gaussian kernels, respectively) than parametric ML estima¬ 
tion (Op(n~^}) with respect to the mean integrated squared 
error (MISE)[T0|. 

Finally, some approaches require searching over nonpara¬ 
metric sets for which a maximum likelihood estimate exists. 
Some cases are discussed in [111112] . wherein the authors con¬ 
struct maximum likelihood estimators for unknown but Lip- 
schitz continuous pdfs. Although Lipschitz functions display 
desirable continuity properties, they can be non-differentiable. 
Therefore, such estimates can be non-smooth, but perhaps 
more importantly, they are not efficiently computable as a 
closed-form solution cannot be derived [111112] . 

This paper presents a case where a nonparametric ML 
estimator exists, is efficiently computable, consistent and re¬ 
sults in a smooth pdf. The pdf is assumed to be band-limited 
(BL), which has a hnite-support in the Fourier domain. The 
BL assumption essentially can be thought of as a smoothness 
constraint. However, the proposed method does not require 
penalizing the likelihood function to guarantee the existence 
of a global maximum, and therefore may preserve the asymp¬ 
totic properties of ML estimators (i.e. consistency, asymptotic 
normality and efficiency). The BLML method is hrst applied 
to surrogate data generated from both BL and inhnite-band 
pdfs, where in both cases it outperformes all tested KD esti¬ 
mators (including higher order kernels) both in convergence 
rate and computational time. Then the BLML estimator is 
applied to the neuronal data recorded from a rat’s entorhinal 
cortex “grid cell” and is shown to outperform state-of-the-art 
estimators used in neuroscience. 
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The BLML Estimator 

We begin with a description of the BLML estimator in the 
following theorem. 

Theorem 1. Consider n independent samples of an unknown 
BL pdf, f{x), with assumed cut-off frequency fc- Then the 
BLML estimator of f{x) is given as: 




sin(7r/e(x - Xj)) 
7r(x — Xi) 


where c = [ci, • • ■ , Cn] and 


c = argmax 


Pn(c)=0 


" 1 

n 4 


Here p„i{c) = ^ CjSij - T Wi = 1, 
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n = 

_A sin(7rfc(xi-xj)) y. . _ - 
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Proof: See supporting information (SI). 


, n and 


The system of equations, pn{c) = 0 in Q is monotonic, i.e., 
> 0, with discontinuities at each a = 0. Therefore, there 
are 2” solutions, with each solution located in each orthant, 
identified by the orthant vector co = sign(c). Each solution 
corresponds to a local maximum of the likelihood function 
which is also its maximum value in that orthant. Hence, the 
global maximum always exists and can be found by finding 
the maximum of these 2” maxima. However, it is computa¬ 
tionally exhaustive to solve which entails finding the 2" 
solutions of Pn(c) = 0 and then comparing values of ]~[ ^ for 
each solution. 

Therefore, to compute the BLML estimator, three approx¬ 
imate algorithms are proposed: a binary quadratic program¬ 
ming (BLML-BQP) algorithm, a BLMLTrivial algorithm, and 
its quicker implementation - BLMLQuick. Both theory and 
simulations show that the BLML-BQP algorithm is appropriate 
when the sample size is n < 100 and no additional knowl¬ 
edge is known other than the pdf is BL. However, in cases 
when n > 100 and the underlying pdf is strictly positive, the 
BLMLTrivial and BLMLQuick algorithms are more appropri¬ 
ate as they are guaranteed to yield a consistent estimate (see 
Theorems 4, 5 and 6) and converge at a rate (~ ^), which is 
faster than all tested KD estimates. Further, the BLMLQuick 
algorithm shows a remarkable improvement in computational 
speed over tested KD methods. 


Consistency of the BLML Estimator. Proving consistency of 
the BLML estimator is not trivial as it requires a solution to 
However, if f{x) > 0 Vx then consistency of BLML 
estimator can be established. To show this, first an asymp¬ 
totic solution Coo to Pti(c) = 0 is constructed (Theorem 3). 
Then, consistency is established by plugging Coo into Q to 
show that the ISE and hence the MISE between the resulting 
density, foo{x), and f{x) is 0 (Theorem 4). Then, it is shown 
that the KL-divergence between foo{x) and f{x) is also 0, and 
hence Coo is a solution to Q, which makes /oo (x) the BLML 
estimator f{x) (Theorem 5). Theorems 2-5 and their proofs 
are presented in SI. 

Generalization of the BLML Estimators to Joint Pdfs. Con¬ 
sider the joint pdf /(x), x G R™, such that its Fourier trans¬ 
form F{(ja) = f /(x)e“^“ ^dx has the element-wise cut off 
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frequencies in vector = 27rf*’'“®. Then the BLML esti¬ 

mator is of the following form: 

/(®) = (^^^CiSincf,(x-Xi)j [3] 

where, fc & R"* is the assumed cutoff frequency, vector x(s i = 
1 • • • n are the data samples, sincf^(x) = a-nd 

the vector c = [ci, ■ • • , c„]^, is given by 

c = argmax (IT . [4] 

Pn{c)~0 \ C / 

Here p„i(c) = CjSij - fff, Sij = sincf^(x.j - Xj). 

The multidimensional result can be derived in a very sim¬ 
ilar way as the one-dimensional result as described in SI. 

Computing the BLML Estimator. The three algorithms, 
BLMLTrivial, BLMLQuick and BLML-BQP are described next. 


BLMLTrivial Algorithm. It is a one-step algorithm that 
first selects an orthant in which the global maximum may 
lie, and then solves Pn(c) = 0 in that orthant. As Pn(c) = 0 
is monotonic, it is computationally efficient to solve in any 
given orthant. 

As stated in Theorem 6 (see SI), the asymptotic solu¬ 
tion of lies in the orthant with indicator vector coi = 
1 Vi = 1, • • • , n if f{x) is BL and f{x) >0 V a; G R. There¬ 
fore, the BLMLTrivial algorithm selects the orthant vector 
Co = ±[1,1, • ■ ■ , 1]^, and then Pn(c) = 0 is solved in that or¬ 
thant to compute c. It is important to note that when f{x) 
is indeed BL and strictly positive, then the BLMLTrivial esti¬ 
mator converges to BLML estimator asymptotically. 

Due to its simplicity, the computational complexity of the 
BLMLTrivial method is very similar to KDE with complexity 
0{nl), where I is the number of points where the value of pdf 
is estimated |13p: but the BLMLTrivial method has an ex¬ 
tra step of solving equation Pn(c) = 0. This equation can be 
solved in 0{n^), using gradient descent or Newton algorithms. 
Therefore, the computational complexity of BLMLTrivial es¬ 
timator is 0{n^ -I- nl). 


BLMLQuick Algorithm. The BL assumption of the true pdf 
allows for a quick implementation of the BLMLTrivial estima¬ 
tor - “BLMLQuick”. For details, see SI. Briefly, BLMLQuick 
first groups the observed samples into bins of size < 
Then, it constructs the BLMLTrivial estimator of the dis¬ 
crete pdf (or the probability mass function, pmf) that gen¬ 
erated the binned data. The true pmf for the binned data 
has infinite-bandwidth. Hence, under the required conditions, 
the BLMLTrivial estimate constructed using the Nyquist fre¬ 
quency, 2/c, converges to the continuous pdf f{x), from which 
the pmf is obtained via sampling. f{x) can be made arbitrar¬ 
ily close to the true pdf f{x) by choosing smaller and smaller 
bins. In fact, if the bin size reduces as then the ISE 

between f{x) and f{x) is of 0{l/n). Therefore, the MISE 
for BLMLQuick is 0{ljn) plus the MISE of the BLMLTrivial 
estimator. Since the MISE of the BLMLTrivial estimator has 
to be greater than 0{l/n), the BLMLQuick algorithm is as 
efficient as the BLMLTrivial algorithm. Specihcally, the com¬ 
putational complexity of BLMLQuick is 

O (n + /2n°-®+2/(”-i) + where ^ governs 

the behavior of the tail of the true pdf. 
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Fig. 1: Comparison of BLMLTrivial and BLML-BQP - In 
this plot, we show the results of BLMLTrivial and BLML-BQP 
algorithms using a non-strictly positive true pdf f{x) = 
0.4sinc^(0.4a:), (A,C) and a strictly positive pdf f{x) = 
(sinc^(0.2a:) + sinc'^(0.2a; -P 0.1)), (B,D). The cut-off fre¬ 
quency was assumed to be /c = ■ The p-values were 

calculated using a paired t-test at n = 81. 


BLML-BQP Algorithm. To derive the BLML-BQP algorithm, it 
is first noted that the 2" solutions of pnic) = 0 are equivalent 
to the 2" local solutions of: 

c = arglocal max( II c^). [5] 

ca’Sc=n2 


here S £ is a matrix with i, j’th element being Sij. Now, 

if Co £ {1,-1}" is an orthant indicator vector and A > 0 is 
such that (Aco)^S(Aco) = n^, then Q implies: 


He? > a"*" 



«Sco)" 


[ 6 ] 


Finally, the orthant where the solution of @ lies is found 

by maximizing the upper bound ^ 2 '^^ using the following 
BQP: 

Co = argmax (cp Sco). [7] 

cosl-i.i}" 


Fig. 2: Comparison of BLML and KD estimation - 

In this plot we compare the results of the BLMLTrivial and 
BLMLQuick estimators to the KDE2nd, KDE6th and sine KD 
estimators. Comparison of the MISE as a function of n for 
(A) a strictly positive band-limited true pdf (the one used in 
Figure [d B) and (B) an infinite band Gaussian normal pdf. 
For theBLML estimators the cut-off frequencies are chosen 
as /c = 2/,(’’“® for the BL true pdf and /c = 2 for the normal 
true pdf. For the KDE2nd and KDE6th, the optimal band- 
widths were chosen as q — and respec- 

Tc fc 

tively and also to match the MISE for the BLML estimator 
for n = 1. For the KDEsinc, the fc was kept the same as 
the fc for BLML estimators. (C) The MISE as a function 
of the cutoff frequency for ^ BL true pdf with cut-off 

frequency ■ n — 10'* was used for creating this plot. 
(D) Computation time as a function of n. The p-values were 
calculated between the BLMLTrivial estimator and other es¬ 
timators using paired t-tests for either logjQ(n) = 5 (A,B,D) 
or log]^o(/c//c'^'*'^) = f-6 (C) ^nd are color coded. 


all nearby orthants is no greater than that of the current or¬ 
thant. The BLML-BQP is computationally expensive, with com¬ 
plexity 0{n^ + nl + BQP{n)) where BQP{n) is the compu¬ 
tational complexity of solving BQP problem of size n. Hence, 
the BLML-BQP algorithm can only be used on data samples 
n < 100. 


BQP problems are known to be NP-hard m, and hence a 
heuristic algorithm implemented in the gurobi toolbox m 
in MATLAB is used to find an approximate solution cq in 
polynomial time. Once a reasonable estimate for the orthant 
Co is obtained, pnic) = 0 is solved in that orthant to find 
an estimate for c. To further improve the estimate, the so¬ 
lutions to Pnic) = 0 in all nearby orthants (Hamming dis¬ 
tance equal to one) of the orthant co are obtained and sub¬ 
sequently ^ is evaluated in these orthants. The neighbor¬ 
ing orthant with the largest ^ is set as cq, and the process 
was repeated. This iterative process is continued until in 


Results 

A comparison of BLMLTrivial and BLML-BQP algorithms on 
surrogate data generated from known pdfs is presented first. 
Then, the performance of the BLMLTrivial and BLMLQuick al¬ 
gorithms is compared to several KD estimators. Finally, we 
show the application results of BLML, KD and GLM methods 
to neuronal spiking data. 

Performance of BLMLTrivial versus blml-bqp on Surrogate 
Data-In FigureBLMLTrivial and BLML-BQP estimates are 
presented assuming that the true pdfs are BL by fc = fc^'^‘^. 
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Panels (A, C) and (B, D) use surrogate data generated from 
a non-strictly positive pdf fx = 0.4 sinc^ (0.4a;) and strictly 
positive pdf f{x) = (sinc"^(0.2a;) + sinc'*(0.2a; + 0.1)), re¬ 

spectively. Both pdfs are BL from (—0.4,0.4). In Panels A 
and B, the BLML estimates (n = 81) are plotted using both 
algorithms, and the true pdfs are overlayed for comparison. In 
Panels C and D, the MISE is plotted as a function of sample 
size n for both algorithms and both pdfs. For each n, data 
were generated 100 times to generate 100 estimates from each 
algorithm. The mean of the ISE was then taken over these 
100 estimates to generate the MISE plots. 

As expected from theory, the BLML-BQP algorithm 
works best for the non-strictly positive pdf, whereas the 
BLMLTrivial algorithm is marginally better for the strictly 
positive pdf. Note that as n increases beyond 100, 
the BLML-BQP algorithm becomes computationally expensive, 
therefore the BLMLTrivial and BLMLQuick algorithms are used 
in the remainder of this paper with the assumption that the 
true pdf is strictly positive. 


BLML and KDE on Surrogate Data. The performance of 
the BLMLTrivial and BLMLQuick estimates is compared with 
adaptive KD estimators which are the fastest known non- 
parametric estimators with convergence rates of 

and 0{n~^) for 2nd-order Gaussian (KDE2nd), 
6th-order Gaussian (KDE6th) and sine (KDEsinc) kernels, re¬ 
spectively [iniin]- Panels A and B of Figure plot the MISE 
of the BLML estimators using the BLMLTrivial, BLMLQuick, 
and the adaptive KD approaches for cases in the presence of 
BL or non-BL pdf, respectively. In the BL case, the true 
pdf is strictly positive and is the same as used above, and 
for the infinite-band case, the true pdf is normal. For the 
BLMLTrivial, BLMLQuick and sine KD estimates, fc = 2/*™® 
and fc = "2 are used for the BL and inhnite-band cases, respec¬ 
tively. For the 2nd and 6th-order KD estimates, the optimal 
bandwidths {q = and q = respectively) 

are used. The constant ensures that MISEs are matched 

for n = 1. 

It can be seen from the Figure that for both the BL and 
infinite-band cases, BLMLTrivial and BLMLQuick outperform 
KD methods. In addition, the BLML estimators seem to 
achieve a convergence rate that is as fast as the KDEsinc, 
which is known to have a convergence rate of 0{n~^). Figure 
C plots the MISE as function of the cut-off frequency fc for 
the BL pdf. BLMLTrivial and BLMLQuick seem to be most sen¬ 
sitive to the correct knowledge of fc, as it shows larger errors 
when fc < which quickly dip as fc approaches 

When fc > the MISE increases linearly and the BLML 

methods have smaller MISE as compared to KD methods. 

Finally, Figure [2p plots the computational time of the 
BLML and KD estimators. All algorithms were implemented 
in MATLAB, and in-built MATLAB 2013a algorithms were 
used to compute the 2nd and 6th-order adaptive Gaussian 
KD and sine KD estimators. The results concur with theory 
and illustrate that BLMLTrivial is slower than KD approaches 
for large number of observations, however, the BLMLQuick al¬ 
gorithm is remarkably quicker than all KD approaches and 
BLMLTrivial for both small and large n. 


BLML Applied to Neuronal Spiking Data. Neurons generate 
action potentials in response to external stimuli and intrinsic 
factors, including the activity of its neighbors. The sequence 
of action potentials over time can be abstracted as a point pro¬ 
cess, where the timing of action potentials or “spikes” carry 
important information. The stochastic point process is char- 
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acterized by a conditional intensity function (GIF), denoted 
as A |17 |. 

Here, the BLML, KD, and GLM methods are applied to 
estimate the GIF of a “grid cell” from spike train data. In the 
experimental set up, the Long-Evans rat was freely foraging in 
an open field arena of radius of Im for a period of 30-60 min¬ 
utes. Gustom microelectrode drives with variable numbers of 
tetrodes were implanted in the rat’s medial entorhinal cortex 
and dorsal hippocampal GAl area. Spikes were acquired with 
a sampling rate of 31.25 kHz and hlter settings of 300 Hz-6 
kHz. Two infrared diodes alternating at 60 Hz were attached 
to the drive of each animal for position tracking. Spike sorting 
was accomplished using a custom manual clustering program 
(Xclust, M.A. Wilson). All procedures were approved by the 
MIT Institutional Animal Gare and Use Gommittee. 

The spiking activity of grid cell is known to be place- 
modulated, whose peak firing locations dehne a grid-like array 
covering much of the 2-dimensional arena. A spike histogram 
of the selected cell as a function of the rat’s position is shown 
in Figure 1^, which plots the {x, y) coordinates of the rat’s 
position when the cell generate spikes (red dots) and the rat’s 
trajectory inside the arena (blue trajectory). The GIF was 
then estimated as a function of the rat’s position (stimuli) 
and the neuron’s spiking history (intrinsic factors): 



where x{t), y{t) is the rat’s position over time inside an arena, 
and the vector Ht, consists of spiking history covariates at 
time f as in [m nil Uni 1111122]. 

Baye’s rule [H] allows one to use nonparametric ap¬ 
proaches to estimate A(-) as follows |24| : 


X{t\x,y,h) 


N f(x,y, /i|spike in time At) 
T f{x,y,h) 


[ 8 ] 


where h{t) = log(time since last spike), N is the total num¬ 
ber of spikes within time interval T, which is the to¬ 
tal duration of the spike train observation. f{x, y, h) and 
/(*, y, h|spike in time At) are densities which are estimated 
using both KDE2nd (higher order kernels were too slow for 
estimation) and BLMLQuick methods. The use of the loga¬ 
rithm allows for a smoother dependence of A on h, which in 
turn allows for capturing high frequency components in the 
GIF due to refractoriness (i.e., sharp decrease in A(t) after a 
spike) and bursting. 

The bandwidths and cut-off frequencies used are qx = 
qy = n~°-^,qh = 0.5n“°'^ and fex = fey = 1.4,/ch = 1.75 
for KDE2nd and BLMLQuick respectively. These bandwidths 
and cut-off frequencies were chosen after testing different com¬ 
binations, and the frequencies and bandwidths that best fits 
the test data (i.e., had the lowest KS statistics, see below) 
for each method were used. Since the rat cannot leave the 
circular arena, the estimates f{x,y,h\spike in time At) and 
f{x,y,h) are normalized to integrate to 1 within the arena. 
The nonparametric estimates of A are also compared with two 
popular GLMs: 


1. Gaussian GLM (GLMgauss) 


2 . 


25 

\og{\{x,y,'H.)) = ai+a2X+a3y+a4,x'^ + azy^ +aQxy+^^ jiihi 

i=l 


Zernike GLM (GLMzern) 


[9] 


25 

\og{\{x,y,'H)) + ^Pihi [10] 

i=l 


where ai,--- , ae, are the parameters estimated from 

the data. H = [hi,-- - ,/i 25 ]^ where hi are the number of 
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Fig. 3: Comparision of BL-MLE(trivial), KDE and GLM methods for grid 
cell data - (a) The trajectory of rat during the duration of 

the experiment is shown by black line. Each dot marks (x,y) 
co-ordinates of rat’s position when the place cell spikes, (b) 
The KS-plot for BLMLTrivial (blue), KDE2nd (purple), GLM 
Gaussian (black) and GLM Zernike (red), along with 95% con¬ 
fidence intervals (dashed ines) using 20% of test data, (c) Es¬ 
timated X{t) using the four methods. All four methods show 
refratoriness and bursting. 


spikes in {t — 2i,t — 2i + 2)ms and S'f'® Zernike polyno¬ 

mials of 3rd order. 

The goodness-of-fit of the four estimates of A are com¬ 
puted using the time rescaling theorem and the Kolmogorov- 
Smirnov (KS) statistic [25]. Briefly, 80% of the data is used 
to estimate A and then the empirical CDF of rescaled spike 
times is computed using the remaining 20% test data, which 
should follow a uniform CDF if the estimate of A is accurate. 
The similarity between the two CDFs is quantified using the 
normalized KS-statistic and visualized using the KS-plot. A 
value of KS> 1 indicates that the estimated A is too extreme 
(p < 0.05) to generate the test data. The closer the normal¬ 
ized KS-statistic is to 0, the better the estimate. 

Figure [^ B, shows the KS-plots for each estimator. 
It is clear from the figure, that the BLMLQuick estimate is the 
only model for which the KS-plot remains inside the 95% con¬ 
fidence bounds with KS= 0.65. The KS for KDE2nd, GLM 
Gaussian and GLM Zernike methods were 1.29, 4.98 and 7.47, 
respectively. 

Finally, Figure [^, plots the GIF estimates of \{t) us¬ 
ing the four methods for a sample period of 400ms. The 
BLMLQuick method allows for a sharper and taller A(t), than 
the GLM and KDE2nd methods and it successfully captures 
the known behaviour of refractoriness and bursting in the 
neuronal activity. Although, in this instance the BLMLQuick 
method outperforms the KD and GLM methods, it is not 
yet clear whether this will always be the case. Several other 
model structures for the GLM methods must be tested, and 
there certainly will be receptive fields of neurons where the 
relationship to external covariates is more Gaussian-like and 



tion of /c. The cons is an arbitrary constant that is added to 
MNLL so that the logarithm of sum could exist. 


the GLM Gaussian or GLM Zernike may do better than 
BLMLQuick. However, for neurons like the grid cell, where the 
receptive field’s dependence on external covariates does not 
follow any particular known profile, nonparametric methods 
like BLMLQuick or KDE methods may be more appropriate. 


Discussion 

In this paper, a nonparametric ML estimator for BL densi¬ 
ties is developed and its consistency is proved. In addition, 
three heuristic algorithms that allow for quick computation 
of the BLML estimator are presented. Although these algo¬ 
rithms are not guaranteed to generate the BLML estimate, 
we show that for strictly positive pdfs, the BLMLTrivial and 
BLMLQuick estimates converge to the BLML estimate asymp¬ 
totically. Further, BLMLQuick is remarkably quicker than all 
tested KD methods, while maintaining convergence rates of 
BLML estimators. Even further, using surrogate data, it is 
shown that both the BLMLTrivial and BLMLQuick estimators 
have an apparent convergence rate of 1/n for MISE, which 
is equal to that of parametric methods. Finally, BLML is 
applied to spiking data, where it outperforms state-of-the-art 
estimation techniques used in neuroscience. 

The BLML estimators may be motivated by quantum me¬ 
chanics. The function g{x) in the development of BLML 
estimate (see SI) is analogous to the wave function [26] in 
quantum mechanics, where the square of the absolute value 
of both are probability density functions. In addition, in quan¬ 
tum mechanics the wave function of momentum is the Fourier 
transform of the wave function of position. Therefore, if the 
momentum wave function has finite support, then the posi¬ 
tion wave function is BL and vice versa. Such occurrences 
are frequent in the single or double slit experiment, where one 
observes bandlimted (sinc^(/ia;) and cos^{f 2 x) respectively) 
profile for the probability of finding a particle at a distance x 
from the center. Also, in the thought experiment of a particle 
in a box: the wave function for position has finite support, 
making the momentum wave function BL. We suspect that a 
large number pdfs in the nature are BL because macro world 
phenomenon are a sum of quantum level phenomenon and 
pdfs at quantum level are shown to be BL (single and double 
slit experiments). Furthermore, the set of BL pdfs is complete, 
i.e. the sum of two random variables that each have a BL pdf 
is a random variable whose pdf is a convolution of original 
pdfs, and hence is BL. Therefore, if macro level phenomenon 
is a linear combination of different quantum level phenomenon 
with BL pdfs, then the macro level phenomenon will also gen- 
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erate a BL pdf. In fact, we see this at macro level where we 
observe Gaussian pdfs of various processes. The Gaussian pdf 
is almost BL, with cutoff frequency /c = 10/cr (< of 

its power lies outside this band). In fact, given hnite data, 
it is impossible to distinguish if the data is generated by a 
Gaussian or BL pdf. 

Choosing A Cut-off Frequency for the BLML Estimator. The 

BLML method requires selecting a cut-off frequency of the 
unknown pdf. One strategy for estimating the true cut-off 
frequency is to first ht a Gaussian pdf using the data via ML 
estimation. Once an estimate for standard deviation is ob¬ 
tained, one can estimate the cut-off frequency using the for¬ 
mula /c = 1 /cr, as this will allow most power of the true pdf to 
lie within the assumed band if the true pdf has Gaussian-like 
tails. 

Another strategy is to increase the assumed cut-off fre¬ 
quency of BLML estimator as a function of the sample size. 
For such a strategy, the BLML estimator may converge even 
when the true pdf has an infinite frequency band, provided 
that the increase in cut-off frequency is slow enough and 
the cut-off frequency approaches infinity asymptotically, e.g. 
LJc oc log(n). 

A more sophisticated strategy would be to look at the 
mean normalized log-likelihood (MNLL), E(—^ as 

a function of assumed cut-off frequency /c. Figure plots 
MNLL (calculated using BLMLTrivial algorithm) is plotted 
for n = 200 samples from a strictly positive true pdf f(x) = 
^^^^^(sinc^(0.2x)-|-sinc^(0.21-1-0.1)) along with Note 

that ~ E(^ Y^^■ CiCjOij), where Oij = cos{fc{xi-Xj)). 

We see that the MNLL rapidly increases until reaches 
after which the rate of increase sharply declines. There is a 
clear “knee” in both MNLL and curves at = /i™=. 
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Therefore, can be inferred from such a plot. A more 

complete mathematical analysis of this “knee” is left for fu¬ 
ture work. 

Making BLMLQuick Even Faster. There are several faster im¬ 
plementation of KD approaches such as those presented in 
[13[ 127) . These approaches use numerical techniques to evalu¬ 
ate the sum of n kernels over I given points. Such techniques 
may also be incorporated while calculating the BLMLQuick es¬ 
timator to make it even faster. Exploration of this idea will 
be done in a future study. 

Asymptotic Properties of the BLML Estimator. Although, 
this paper proves that the BLML estimate is consistent, it 
is not clear whether it is asymptotically normal and efficient 
(i.e., achieving a Cramer-Rao-like bound). Studying asymp¬ 
totic normality and efficiency is nontrivial for BLML estima¬ 
tors as one would need to first redefine asymptotic normal¬ 
ity and extend the concepts of Fisher information and the 
Gramer-Rao lower bound to the nonparametric case. There¬ 
fore, we leave this to a future study. However, we pos¬ 
tulate here that the curvature of MNLL plot might be re¬ 
lated to Fisher information in the BLML case. In addition, 
although under simulations, the BLML estimator seems to 
achieve a convergence rate similar to its parametric counter¬ 
parts {Op{n~^)) it is not proved theoretically. 
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Supporting Information 

Preliminaries and Formulation of the BLML Estimator. Con¬ 
sider a pdf, f{x), of a random variable a; £ R with Fourier 
transform F{uj) = f f(x)e~“^^dx. Let U(a;c) be the set of 
band-limited pdfs with frequency support in {—uJc, 0 Jc), i-e., 
ojc G R is the cut-off frequency of the Fourier transform of the 
pdf. Then, 

U((.jc) = |/ : M IR+ I J f(x)dx = l,F{u}) = 0 V|(.j| 

[SIl] 

Since each element / £ U(a;c) is a pdf, it can be written as 
f(x) = g^{x), where g £ V(a;c) defined as: 

V(c.;c) = {ff : K ^ R I g{x) = ^/J[x), f G U(a;c)} [SI2] 


Proof of Theorem 1 

Proof: Because of is equivalent to 

G{uj) = argmax {L[G]). 

G:H^-C.||G|||=l 


[ 819 ] 


Note that Parseval’s equality [I] is applied to get the con¬ 
straint \\ G\\% = 1. Now, the Lagrange multiplier [2] is used to 
convert Pl9t into the following unconstrained problem: 


G{<jj) = argmax {L[G] + A (l - lie'll^)) • [SIIO] 

G(a;) can be computed by differentiating the above equa¬ 
tion with respect to G using calculus of variations [3] and 
equating it to zero. This gives: 


Finally, W(tJc) can be defined as the set of all Fourier trans¬ 
forms of elements in V(a;c): 

W(cJc) = |g : R C I G{uj) = J g{x)e-''^^dx, g G V(a;c)| 

[SI3] 

Note that since f{x) £ l[J(tJc) is band limited, g{x) £ 
Y{u!c) will also be band limited in ^). Therefore, 

G(a;) = 0 V|a;| > VG £ W{uJc). Finally, V(tJ,) and 
W(cUc) are Hilbert spaces with the inner product defined as 
< a, b >= f a{x)b* {x)dx, < a, b >= ^ J a{uj)b* {uj)duj, re¬ 
spectively. The norm ||o ||2 =< a,a > is defined for both 
spaces. Further, note that for all elements in Y{uJc) and 
W{ujc), ||a||i =< a,a >= 1. 

The Likelihood Function for Band-limited Pdfs Now 

consider a random variable, a; £ R, with unknown pdf f{x) £ 
l[J(u;c) and its n independent realizations a;i,a; 2 , • • • ,Xn. The 
likelihood L{x\, • • ■ , Xn) of observing * 1 , • ■ • ,Xn is then: 


L(a;i, ■. ■ ,a:„) = ]^ f{xi) = 9^(0:^), g G Y{u}f) [SI4a] 

i=l i = l 

= j G{uj)e^‘^^idu?j , G G WK) 

[SI4b] 


Defining: 




0 


V a; £ (— 


2 ’ 


o.w. 



gives: 


[SIS] 


,x„) = l[{< G{oj),b,{io) >f ^ L[G]. [SI6] 


Further, consider G(aj) which maximizes the likelihood 
function: 


G = arg max(L[G]). [817] 

Gew{uc) 

Then the BLML estimator is: 

fix) = J G(uj)e^'^^du^ . [SIS] 



[Sllla] 


Ci = ^ G{Lj),bj{uj) < G{uj),bi{uj) > 

for i = [Slllb] 


To solve fo r c,. t he value of G is substituted back from 


[Sill] a into pmlb and both sides are multiplied by < 

[SI12a] 


G{u>),bi{u!) > to get: 

n 

Ci Cj < bj{u}), bi{Lo) >= n^k for z = 1 ■ • • n, 

= GrJu(±Gs 




i = l \i=l 


[SI12b] 

[SI12c] 


To go from ||SI12|b to ||SI12|c, observe that < bi{uj), bj(cj) > 


(here /. = |j). Now by defining, 




Sll 


^nl 




[SI13] 


and using [Slllja and the constraint ||G(a ;)||2 = 1, one can 
show that Sc = n^. Also, summing up all n constraints in 
| SI12 la gives c^Sc = n^k , hence k = 1/n. Now, substituting 
the value of k into [sn^ a and rearranging terms gives the 
following n constraints: 


1 G ^ 1 

— CjSij -= pni(c.) = 0 for 2 = 1 • • • n. [SI14] 

As mentioned in the main text, the above system of equa¬ 
tions (pn(c) = 0) is monotonic, i.e., > 0, but with discon¬ 

tinuities at each Ci = 0. Therefore, there are 2” solutions, with 
each solution located in each orthant, identified by the orthant 
vector Co = sign{c). Each of these solutions can be found ef¬ 
ficiently by choosing a starting point in a given orthant and 
applying nu merical methods from convex optimization theory 
to solve for png. Thus, each of these 2" solutions corre¬ 
sponds to a local rnaximum of the likelihood functional L[G]. 
The global maximum of L[G] can then be found by evaluating 
the likelihood for each solution c = [ci, ■ • ■ ,c„]^ of piig . 
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The likelihood value at each local maximum can be computed 
efficiently by using the following expression: 


L(c) - n („ ) - n ^2 ■ 


[SI15] 


This expression is derived by substituting |SIll|a into 


|SI6| and then substituting Isnil i nto the result. Now 
the global maximum c can be found by solving Q. Once 
the g l obal maxi mum c is computed, we can put together 
I SIS 1,1 SIS I and |SIll|ato write our solution as Q. Hence 
proved. 


Consistency of the BLML Estimator. 


Bounds on Bandlimited PDF 

In this section the following theorem is first stated and proved. 


Theorem 2. For all f £ U(a;c) f{x) < ^ V* G R. 

Proof: Above theorem can be proven by finding: 

y = max maxf(x). [SI161 

f£V{u:o) 2:6K 

Because a shift in the pdf domain (e.g. f(x — /r)) does not 
change the magnitude or bandwidth of F{uj), without loss of 
generality one can assume that max^igiR f{x) = /(O) and write 
the above equation as 


y = max (/(O)) 

= max (( G(tj)da;) ^ 

GewC^c) \ ^ ) 

= max I I / G(ij)b(u})dw 

IlGIp^i 

= max(^J G(u})b{u))dijjJ +A(||G||^ —1) 



[SI17a] 

[SI17b] 

[SI17c] 

[SI17d] 

[SI17e] 


Here b{uj) = 1 


|w| < tt/c and is 0 otherwise. Now 


by differentiating pI17| |e and subsequently setting the result 
equal to 0, gives G*{uj) = Therefore g*{x) = 

V /c TT V Jc^ 

which gives V = fc= 

Corollary: By the definition of Y{ljJc.), one can apply 
Theorem and show that for all g £ V(aic), g(x) < \/^- 


Sequence Cnj: 

Now a sequence c„j is defined and some of its properties are 
stated and proved. These properties will be used to prove 
Theorems [S] and below. 


Properties of c„j 

Cnj has following properties: 


(PI) 

Onj 

(P2) Cnj = 


n 

1 


a{xj) 


= 9{xj) 


1 + 0 


[SI19a] 


ng^(xj) 


for ng^{xj) > fa 


(■P3) cY = (1 _ c„jg{xj)) 

fc 


{PA) 


3/2 - \/5/2 


/c 


— I \ — \ I j. 


(P5) 0 < 1 - Cnjg{xj) < 1 


[SI19b] 

[SI19c] 

[SI19d] 

[SI19e] 


1 / 1 \ 

(P6) 1 - '^Cnjg{xj) < Oa.s. ( —j= ) if g(x) > Gix [SI19f] 

Vvhy 

(P7) = 

r) ' r) » 


n ^ 


= g{xi) + Eni g{xi) simultaneously Wi if g{x) > OVx 

[SI19g] 


(PS) Cooj = lim Cnj > c„j 'in 


[SI19h] 


Proofs for Properties of c„j 

(PI) can be proved by direct substitution of Cnj into left hand 
side (LHS). (P2) can be derived through binomial expansion 
of Cnj- (P3) can again be proved by substituting Cnj and 
showing LHS=RHS. (P4) and (P5) can be proved by using 
the fact that both and Cnjg{xj) are monotonic in g^{xj) 

since , < 0 and > 0. Therefore, the minimum 

and maximum values of \cj\ and Cjg{xj), can be found in by 
plugging in the minimum and maximum values of q^(xi) (note 
0 < g^ixj) < fa, from Thm^). 

(P6) is proved by using Kolmogorov’s sufficient criterion 
[4] for almost sure convergence of sample mean. Clearly, from 
(P5) E{Cnjg‘^{xj)) < oo which establish almost sure conver¬ 
gence. Now, let ff = ^'^Cnjg{xj). Then multiplying each 
side of n equations in (PI) by respectively, adding them 

and the normalizing the sum by ^ gives: 


-E 


n ^ Cnjg{xj) n 

1 


= i + -E 
< 1 + 


a-njfa 

ng{xj) 


1 + bn 


[SI20a] 

[SI20b] 

[SI20c] 


- A ngjxj) 

"""" 2fa 



4 fc 
n g^ixj) 



VI < J < n 


[SI18a] 


Above bn = J2j S^g(x ) ■ SO from |SI20[ |a to | |SI20[ ]b, 


the result - Y) 

TL ' 


> 


= j (Arithmetic Mean 


jg(xj) — ^Cnjg(xj) 

> Harmonic Mean) is used. Now it can be shown that 
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by first noting that 


bn < Oa.s (;^)i as following: 


g{xi) 


< 


y— 

n( -r. 


n^/n ^ g{xi) 
1 






n \g[xi] 

1 

fn 


[SI21a] 

[SI21b] 

[SI21c] 

[SI21d] 


P>l-Oa.s{^-^ [SI22] 

substituting P in LHS of (P6) proves it. 

To prove (P7) we first establish almost sure convergence 
of each eaquation seprately by using Kolmogorov’s sufficient 
criterion [4]. By Kolmogrov’s sufficient criterion: 


j 

^ ^ji^ij^nj) if ^ ^ 


[SI23a] 

[SI23b] 


lows: 


Thus, now we compute EjiCnjSij) and Ej{c^jslj) as fol- 


sijg(xj) + O fi') -^dxj 
^ ng^(x)>fc V^/ 9(^j) 


= J CnjSijg^{xj)dxj - g{xi) 

-I 

•J nc 

J nc, 

= Sijg{xj)dxj- {1 - Cnjg{xj))sijg{xj)dxj 

J Jng^(x)<fc, 

[SI24c] 


'ng^(x)<fc 


CnjSijg‘^(xj)dxj - g{xi) 


[SI24a] 


[SI24b] 


- f o y) 

Jng‘^(x)'>fc \n/ gi^Xj') 


o ( i 1 ' dxj - g{xi) 


L 


ng^(x)<fc 


y ) \ 


(-) 

[ 

Sij 

\nj 

J ng^(x)>fc 

g{xj) 


da;^ 


[SI24d] 


To go from | SI24 |c to | SI24 |d, the facts that f Sijg{xj)dxj = 
g{xi) for any g G V(a;c) and (P5) are used. Now define 

£n{Xi) = 

^ (n) fng2(x)>fa \g(k) \ kg^ixXfa I ' 

Then it is shown, 

\Ej{cnjSij) - g{xi)\ < e„{xi) 0 uniformly if g(x) > 0 

[SI25a] 


L 


ng^(x)>fa 


g{xj 


dXj < 


kL 


ng'^{x)>fcL 


I dXj, 


and that the length of limit of integration has to be less 


than 


Y as g (x) has to integrate to 1. This 


makes 


Lg 2 (x)>f, I < 0(log(n)) and hence 


O 


0 / 


”■/ Jng2(x)>f^ 


g{xj 


dxj < O 


log(n; 


0 uniformly. 


To go from | SI21 |a to | SI21 |b (P4) and g{x) > 0 are used. 
To go from |SI21|c to | SI21| d, E = / g{xi)dxi is 

used, which has to be bounded as g^{x) is pdf and bandlimited 
(due to Plancherel). Finally the fact that the sample mean of 
positiv e numbers, if c onverg es, con verges almost surely gives 
I SI21 |d. Combining | SI21 |d and | SI20 |c gives: 


Then, J„Y(x)<xkjgi^i)\d^J < LgHx)<x gi^i)dxj 
if g{^) > 0 is also shown to go to zero uniformly, by first 
considering 


A I if g\^i) > k 

' ( 0 o.w. 


1SI261 


The sequence (n{xj) is nondecreasing under the condition 
g^{x) > 0 g^{x) £ ILJ(wc) , i.e (n+ii^j) > Cn{xj) V xj, 

and the lim„_>oo (n(xj) = g(xj). Therefore, by the monotone 
convergence theorem, lim„^oo / (n(xj)dxj = f^^g{xj)dxj. 
This limit converges due to Plancherel. Now, by definition of 

Cn{Xj), 


lim / \sijg{xj)\dxj < 

Jng'^(x)<f, 

fc / g{xj)dxj — lim /c / („{xj)dxj —>■ 0 uniformly. 

/ n —^oo / 

J —oo «/ 

[SI27a] 

Therefore en{xi) —> 0 uniformly V®i which is equivalent to 
saying maxxe{x) —» 0. A we aker bu t more informative proof 
for going to step to ]SI24[ |d can be obtained by as¬ 

suming a tail behaviour of for g{x) and showing the step 


is shown: 


holds for all r > 1, this gives £n(xi) = O Now i 


/■ 


< / sfjdxj = fc < ooMxi 


[SI28a] 

[SI28b] 


To go from [SI28|a to |SI28|b, (P5) and the eq uality 
f sfjdx j = fc are invoiced. Finally, substituting pmllf and 
| SI28 |binto | SI23 |b proves that each equation go to zero al¬ 
most surely but separately. More precisely, untill now only it 
has been shown that there exists sets of events E±,E 2 , - ■ ■ , E„ 
where each set Ei = {s ■. lim„_>oo Pni(c(s)) = 0} and 
P{Ei) = 1. However to establish simultaniety of convergence 
we need to show P(n“i?i) = 1. 

For this, the almost sure convergence of the following 
norm: 


J y Cnjs{x — Xj) — g{x)J dx '^4' 0 if g{x) > 0 

[SI29] 

is established in next section. This implies that 
n s{x — Xj) ^4' g{x) uniformly due to bandlimitedness of 
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functions m- This in turn implies that eqns ^ Cnjs{xi — 
Xj) g{xi) simultaneously for all Xi and hence prove (P7). 
(P8) can be proved easily by showing that > 0 Vn. 


Proof for ISI29 | 

To establish convergence of norm consider: 


= f ^niCnjs{x - Xi)s{x -Xj)+ (x) 

- 2 ^ Cnjs{x - Xj)g{x)dx 

j 


n ^ 

3 


[SI30a] 

[SI30b] 


— 2 'y ^ ^^niCnjSjj + ^ ^ ^ ^ ^ ^ 

^7^3 » 3 


[SI30c] 


— 9 y ^ ^ni^nj^ij H" ^ ( 1 ^ ^ > ^rLj9(.^j ) 


*7^i 




[SI30d] 

[SI30e] 


To go from | SI30||c to ||SI30[|d to |]SI30||d P3 and P6. 
For going to [|S130||e the almost sure convergence proof is 
established in next sectiorfl 

Now, E{cniCnjSij) is Calculated as: 


E{CniCnjSij) = j CniCnjSijQ^ {Xi)g^ {Xj)AXiAXj 


[SI31c 


I Cn^gH Xi){g{xi) + enixi))dxi [SI31b] 

: J Cnig^(Xi)dXi + J ( 


= / Cnig ixt)dxi + / Cmg {xi)e„{xi)dxi 

[SI31c] 

= 1 + 01^^^ ^ 


is) 


+ maxi 


(£^(* 0 ) J \9{xi)\dxi 

[SI31d] 


if g{x) > 0 


[SI31e] 


To go fr om | S I3l|a to |SI31|b [SI25| is used. To go 
from IjSIS^c to |Sl^|d (P6) and (P5) are used. To go 
from | |SI3l| d to mn e uniform convergence of e„{x) and 
g(x) < (XJ (d ue to piancheral) ar e used . Now, combining 
ISI311 e and |SI30|e establishes |SI29| and subsequently 
almost simultaneous convergence. 
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Proof for Almost Sure Convergence of 

Let Sn = ^ ^2i^j CniCnjSij^ then: 


, , 4n(n — l)(n — 2) _2 - \ 

V QiT yhn) — ^ [CniCrLj^nlSijSjTri) 


+ _11 E (c^ s^ ) 


[SI32a] 


2n(n — 2)(2n — 3) 


E {CniCnjCnlCnm^ij Slm) 


4n(n — l)(n — 2)fc 


il 


g{xi)dx. 


2n(n - 1) IP /_2 - I SN 2 X 

—TT -^—E - c„jg(xj))sij) 

Jc^ 

2n(n — 2){2n — 3) 


E (^CniCnj | 5^ | ) 


= c I - 

n 


[SI32b] 

lSI32cl 


To go from |]SI32 |a to | SI32|b f\sijSjm\ < fc (cauchy- 
schwartz inequality), P5, P3 are used. To go from |]SI32 |b 
to [SI32|c J g{x) < (XJ (due to piancheral), P5 and 
J \sijg{xi)\ < vTc (cauchy-schwartz inequality) are used. 

Now, by chebyshev inequality Pr(|S „2 — g\ > e) < O (:^), 
here g = limn-s-oo PllSn). Hence, Pr(|S „2 - g\ > 

e) < (XJ, therefore by Borel-Cantelli lemma, S „2 g. 

Now to show Sn g, divide Sn into two parts An — 
i2'^i^j9niCnjSijI(sij) where I{sij) is indicator function 
which if 1 is Sij > 0 and 0 otherwise (note that c„i > 0 Vi due 
to the assumption g{x) > 0), and Bn = Sn — An- Now, 

T_ /. s 4n(n — l)(n — 2) _2 - i n i\ 

Var[An) < - ^ - E [CniCnjCnl\Sij\\Sjm\) 

+ (cL 44) [SI33a] 

2n(n - 2)(2n - 3) IP ,,,,,, i. n. o 

^ \CniCnjCnlCnm\^ij\\Slm\) 


< 


4n(n — l)(n — 2)/c 


J g{xi)dxi 


+ ^ ( 4(1 - Cnjg{Xj))\Sijf) 


= - 
. n 


lSI33bl 


lSI33cl 


To go from | |SI33| a to | |SI33[ ]b f\sijSjm\ < fc (cau chy- 
schwartz inequality)) P5, P3 are used. To go from [|SI32 |b to 
| |SI32[ ]c / g{x) < (XJ due to piancheral, P5 and f \sijg{xi)\ < 
3 /J 2 (cauchy-schwartz inequality) are used. Therefore, again 
by chebyshev inequality and Borel-Cantelli lemma A „2 14' 
lim„_>oo E{An)- Now, consider integer k such that < n < 
{k -I- 1)^, as rrAn is monotonically increasing (by definition) 
this implies: 


k^ 


A^.2 s An S - ^ -A{fc + 1)2 


(fc-f 1)2 

lim E{An) < An < lim E(An) 


[SI34a] 


lSI34bl 


Now by sand witch theorem An 


limn—xoo E{An), simi¬ 


larly it can be shown that Bn A-' limii_>cx) 
; E{Sn)- Hence proved. 


E{Bn) and hence 
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Now, Theorems 1^ and 1^ are proven. 

Proof for Consistency of the BLML Estimator 

Theorem 3. Suppose that the observations, Xi for i = 

are i.i.d. and distributed as Xi ~ g^{x) £ l[J(a;c). Then, 

Cooi = lim„-»oo ('\/^ + n g'^li) ~ ® solution to 

Pn{c) — 0 in the limit as n — ^ 00 . 

Proof: To prove this theorem, we establish that any equation 
Pni(cn), indexed by i goes to 0 almost surely as n —>■ 00 as 
follows: 


E(c„js{x — Xj)) — g{x) = 0, which establishes the point-wise 
convergence of to 0. Hence, lim can be safely replaced 

by lim inf and Fatou’s lemma can be applied. To go from 
| lSl37| ]e to pl3^ f, is used. 

Hence proved. 

Theorem 5. Suppose that the observations, Xi for i = l,...,n 
are i.i.d. and distributed as Xi ~ f(x) G U(a;c). Then, the 
KL-divergence between f{x) and faoix) is zero and hence Coa 
is the solution of in the limit n —^ 00 . Therefore, the 
BLML estimator f{x) = foo{x) = f{x) in probability. 

Proof: Consider {xi,- • • ,Xn} to be a member of typical set 
[^. Then the KL-divergence between f{x) and foo{x) can be 
bounded as: 


/— \ 1 - . Cnifc t w 1 

Pni (Cn ) — / Cij Cnj 3 V 2 — 1, ■ ■ ■ ,n 

n n c„i 


g{xi) - g{xi) = 0 Vi = 1, 


[SI35a] 

[SI35b] 


In moving from [ |SI35[ |a to JSI35| 
[SI35|b, show that each of the pni 
ability. Therefore, 


b (PI) and (P7) are used. 
Pni(c„) Vi goes to 0 in prob- 


lim p„i(c„) = 0 Vi = 1, • • • ,n [SI36] 

n—>-oo 


This proves Theorem Note that one may naively say 
that lim„_>oo c„i = V i = 1, ■ • • ,n (see (P2)). However, 


this is not true because even for large n there is a hnite prob¬ 
ability of getting at least one g{xi) which is so small such 
that — 27 —r may be hnite, and hence lim„_>oo Cni cannot be 

Tig / 


calculated the usual way. Therefore, it is wise to write down 
Cooi — lim„_»oo Cni as a solution to |SI14|, instead of 


Theorem 4. Suppose that the observations, Xi for i = l,...,n 
are i.i.d. and distributed as Xi ~ f{x) € U(tJc) and f{x) > 


0 Vx. Let, fcx>{x) = lim„^oo(^Z)" 


sin (tt/ c (cc — j; .j) ) 


TTiX — X 


)■ 


then J {f{x) — foo{x))^ da; = 0. 

Proof: Let gaa{x) = lim„_>oo kY^"^^iCaais{x - Xi) here 
s{x - Xi) = Therefore the ISE is: 

^ Tr{x — Xi) 


ISE^ j {gl,ix) - g\x)y dx [SI37a] 

= j { 9 {x) - goo{x))^ {goo{x) + g(x))'^ dx [SI37b] 

< 4/c j(goc.(x) - g{x))^dx [SI37c] 

= 4/c f lim ( — N Cnislx — Xi) — g(x)] dx [SI37d] 
J n-ioo \n J J J 

<4/climinf / ( — N c„,s(a; — x,) — g(a:) ) dx [SI37e] 

n-^oo J \n ) 

“4' 0 [SI37f] 


_ ^c, the inequalit 

is used (see Theorem 
|SI37|d, goo{x) is expanded. 


li^ 

i- 


(gix) + 
To go 
To go from 

I SI37 |d to [ SI37 |e, Fatou’s lemma [5] is invoked as the func¬ 
tion inside the integral is non-negative and measurable. In 
particular, due to (P6), 4>n{x) = - '^Cnjs{x — Xj) — g{x) '4' 


To go from |SI37|b to |SI37| 
g(x))'^ < df c if g^ e 
from |c to 


0 < E 




lim — 

n—^oo Tl 


X^log 


2 ^ 

< lim - Vlog(|p(xi)cooi|) 

n—^oo 71 

i=l 

< 0 


(IM\ 

Xgloixi)) 

[SI38a] 

[SI38b] 

[SI38c] 


To go from plM|L a to 
used. To go from 


b, dehnition of g^o and P7 is 


SI38| b to JSI38 |c, (P5) is used. 


Therefore, the KL divergence between faa{x) and the true 
pdf is 0 and hence foa{x) should also maximizes the likelihood 
function. Finally, c = Coo or c = —Coo. The negative solution 
can be safely ignored by limiting only to positive solutions. 
Hence Proved. 


Theorem 6. If g^{x) = f{x) G U(a;c) such that f{x) > 0 V a; G 
R, then g{x) >0 V x G R, and the asymptotic solution of 0 
lies in the orthant with indicator vector coi = 1 Vi = 1, • ■ • , n. 

Proof: g G Y{u)c) as g^ G l[J(a;c). Therefore g{x) is bandlim- 
ited and hence continuous. Now, assume that 3 xi,X 2 G R 
such that g{xi) > 0 and g(x 2 ) < 0. Due to continuity of 
g this would imply that 3 xa, xi < xs < X 2 such that 
gixs) = /(xa) = 0. This is a contradiction as fix) > 
0 Vx G R. Therefore, either (/(x) < 0 V x G R or equiv¬ 
alently gix) > 0 V X G R. Now, by Theorems and 
coi = sign(ci) = sign(cooi) = sign(g(xi)) = 1 Vi = 1 • • • n 
asymptotically. Hence proved. 


Generalization of BLML estimator to Joint Pdfs. BLML esti¬ 
mator for joints pdfs can be found in a very similar way as it is 
found for one d imensional pdfs. The only change occurs while 
defining | SIS |, where one needs to dehne multidimensional 
b's such that 


[ 11 ] 


inverse Fourier transform of which gives a multidimensional 
sincf^(-) function. 
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BLMLQuick Algorithm. Consider a function f{x) such that: 

f{x) = fs fir)dr [SI39] 

where / £ U(a;c) and fa > 2/c is the sampling frequency. It is 
easy to verify that f{x) is also a pdf and / £ l[J(tJc). Now con¬ 
sider samples f[p] = f{p/ fa), clearly these samples are related 
to f{x) as: 

P+05 

flP] = I " f{x)dx [SI40] 


Further consider xfs computed by binning from Xi’s, n 
i.i.d observations of r.v. x ~ f{x), as: 

Xi = fsl^-0.5\ [SI41] 

Js 

where [J is the greatest integer function. Note that Xi 
are the i.i.d. observations from f{x) = f[p]S “ 7~\ 

Now since fa > 2/c, the BLML estimate for f{x) should con¬ 
verge to f{x) due to Nyquist’s Sampling Theorem [B]. This 
estimator is called BLMLQuick. Assuming that the rate of con¬ 
vergence for BLML is 0{n~^), then if fa is chosen such that 
11/ “ /Hi = the BLMLQuick will also converge with 

0{n~^). This happens at fa = > fc also fa > 2/c if 

n > 16. 

Implementation and Computational Complexity 

Before implementing BLMLQuick and computing its computa¬ 
tional complexity, the following theorem is first stated and 
proved. 

Theorem 7. Consider n i.i.d observations {xi}"^i of random 
variable x with pdf having j/pr tail. Then 

Pr ^min({3:i}Li) < - j ~ 1 - e"'~ e 

[SI42] 

for large n. 

Proof'. For n i.i.d observations {xi}"^i of random variable 
X with cumulative distribution function F{x), it is well known 
that : 

Pr(min({a:i}7=i) < a:) = 1 — (1 — F{x))^ [SI43a] 

~ 1 - VF{x) < 0.5 [SI43b] 

~ 1 _ e"(r-i)/r-i \/F{x) < 0.5 

[SI43c] 
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substituting x = — ^ (c/i)e ) ^ ^ above proves the result. 

Finally, due to duplicity in Xi i = 1,..., n, they can be 
written concisely as [xi,,nb\, b — 1,... ,B where Xb are unique 
values in Xi and Ub is duplicity count of Xb- Now it can be 

observed that B < (max(a;i) — min(a;i))/s < Op(n'^-^fa, if 
the true pdf has tail that decreases as -j/p (Theorem ItI. 

Now we are set to implement BLML quick using f(Mowing 
steps: 

• Compute from {xi}"^i. Computational com¬ 

plexity of 0 {n). 

• Sort {xb,nb}f^i and construct S : Sai, = s{xa — Xb) ^ a, b = 
1,..., B and S = S * diag{{nb}f.^i). Note that S is block- 
Toeplitz matrix (Toeplitz arrangements of blocks and each 
block is Toeplitz) [7]. Computational complexity of 0(5^). 

• Use convex optimization algorithms to solve Pn{c) = 0. 
Newton’s method should take a finite number of iterations 
to reach a given tolerance e since the cost function is self 
concordant [8]. Therefore, the computational complexity 
of optimization is same as the computational complexity of 
one iteration. The complexity of one iteration is the same 
as the complexity of calculating 

(^diag {{l/cl}b^^ -I- S x diag [SI44a] 

= (diag ({l/{clnb)}b=i'^ + sj diag 

[SI44b] 

As diag ({l/(cjni,)}^]^) -|- S is also block-Toeplitz struc¬ 
ture, the Akaike algorithm [7] can be used to evaluate each 
iteration of Newton’s method in 0{B^). 

Note: Simulations show that S can be approximated ac¬ 
curately (to machine accuracy) by a low rank matrix e.g., 
R = 20 for B — 1000, therefore the inversion can be per¬ 
formed in 0{R? + RB). Further, in some cases one may 
end up with a large B (e.g. if true pdf has heavy tails) 
so that storing the Hessian matrix becomes expensive. In 
such cases, a quasi Newton or gradient descent can be used 
which compute BLML estimator fairly quickly. 

• Evaluate BLMLQuick estimate f{x) = {^'^f^i'abCbs{x — 
Xb))^ at I given points. Computational complexity of 
0{BI). 

The total computational complexity is 0{n + B^ + IB). 
Substituting B < O (n^^^ fa<0 , gives 

the total computational complexity 
o(n + . 
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